Text Patterns and Compression Models for Semantic Class Learning

Authors

  • Chung-Yao Chuang
  • Yi-Hsun Lee
  • Wen-Lian Hsu
Abstract

This paper proposes a weakly supervised approach for extracting instances of semantic classes. The method automatically constructs simple wrappers from specified seed instances and uses a compression model to assess the contextual evidence for each extraction. By adopting this compression model, our approach can better avoid erroneous extractions in a noisy corpus such as the Web. The empirical results show that our system performs quite consistently even when operating on noisy text containing many possibly irrelevant documents.
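The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes that a "wrapper" is a pair of short left/right character contexts observed around a seed, and it substitutes a smoothed character-bigram coder for the paper's compression model. The corpus, the seeds, the window widths, and every function and class name below are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter

# Toy corpus and seeds, invented for illustration only.
CORPUS = (
    "We toured cities such as Paris in May. "
    "We toured cities such as London in June. "
    "We toured cities such as Tokyo in July. "
    "We bought fruits such as Mango in bulk."
)
SEEDS = {"Paris", "London"}


def windows(corpus, term, width):
    """Yield (left, right) character windows around each occurrence of term."""
    for m in re.finditer(re.escape(term), corpus):
        left = corpus[max(0, m.start() - width):m.start()]
        right = corpus[m.end():m.end() + width]
        yield left, right


def induce_wrappers(corpus, seeds, width=4):
    """Build simple wrappers: full-width narrow contexts seen around the seeds."""
    return {
        (left, right)
        for seed in seeds
        for left, right in windows(corpus, seed, width)
        if len(left) == width and len(right) == width
    }


def extract(corpus, wrappers, seeds):
    """Use each wrapper as a delimiter pair and collect the new terms it captures."""
    found = set()
    for left, right in wrappers:
        pattern = re.escape(left) + r"(\w+)" + re.escape(right)
        found.update(re.findall(pattern, corpus))
    return found - set(seeds)


class BigramCoder:
    """Assumed stand-in compression model: an order-1 character model whose
    code length for a string is -log2 of its add-one-smoothed probability."""

    def __init__(self, alphabet_size=256):
        self.pair = Counter()
        self.prev = Counter()
        self.alphabet_size = alphabet_size

    def train(self, text):
        for a, b in zip(text, text[1:]):
            self.pair[a, b] += 1
            self.prev[a] += 1

    def bits_per_char(self, text):
        if len(text) < 2:
            return math.log2(self.alphabet_size)
        total = 0.0
        for a, b in zip(text, text[1:]):
            p = (self.pair[a, b] + 1) / (self.prev[a] + self.alphabet_size)
            total -= math.log2(p)
        return total / (len(text) - 1)


def context_cost(coder, corpus, term, width=16):
    """Mean bits per character needed to encode the wide contexts of term."""
    costs = [coder.bits_per_char(l + r) for l, r in windows(corpus, term, width)]
    return sum(costs) / len(costs) if costs else math.inf


# Train the coder on the wide contexts of the seeds only.
coder = BigramCoder()
for seed in SEEDS:
    for left, right in windows(CORPUS, seed, 16):
        coder.train(left + right)

wrappers = induce_wrappers(CORPUS, SEEDS)
candidates = extract(CORPUS, wrappers, SEEDS)
scores = {term: context_cost(coder, CORPUS, term) for term in candidates}

# A lower cost means the candidate's context compresses better under the seed model.
for term in sorted(scores, key=scores.get):
    print(term, round(scores[term], 2))
```

In this toy setting the narrow wrapper captures both a city and a fruit, which is the kind of erroneous extraction the abstract mentions. The coder then assigns a lower cost to the candidate whose wider context resembles the seed contexts, so a threshold or ranking on that cost could filter the noise.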

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in natural language processing and is regarded as a subtask of information extraction. It aims to find proper nouns in text and classify them into predetermined classes such as the names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

CS 2980: Model-based Semantic Compression in Database Project Report

We are now in the big data era: advances in data collection and management technologies have led to large databases. For example, domains such as medicine, biology, music, and the experimental sciences in general are all characterized by large data sequences. Considering the amount of space that big data requires and the amount of I/O needed to fetch it, data compression has become an efficie...

Enriching Text Representation with Frequent Pattern Mining for Probabilistic Topic Modeling

Probabilistic topic models have been proven very useful for many text mining tasks. Although many variants of topic models have been proposed, most existing works are based on the bag-of-words representation of text in which word combination and order are generally ignored, resulting in inaccurate semantic representation of text. In this paper, we propose a general way to go beyond the bag-of-w...

A Model for Extracting Information from Text Documents, Based on Text Mining in the Field of E-Learning

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Publication date: 2011